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Preface 


The mathematical theory of computation has given rise to two important ap¬ 
proaches to the informal notion of “complexity”: Kolmogorov complexity , usu¬ 
ally a complexity measure for a single object such as a string, a sequence etc., 
measures the amount of information necessary to describe the object. Compu¬ 
tational complexity , usually a complexity measure for a set of objects, measures 
the compuational resources necessary to recognize or produce elements of the 
set. The relation between these two complexity measures has been considered 
for more than two decades, and may interesting and deep observations have 
been obtained. 

In March 1990, the Symposium on Theory and Application of Minimal- 
Length Encoding was held at Stanford University as a part of the AAAI 1990 
Spring Symposium Series. Some sessions of the symposium were dedicated to 
Kolmogorov complexity and its relations to the computational complexity the¬ 
ory, and excellent expository talks were given there. Feeling that, due to the 
importance of the material, some way should be found to share these talks 
with researchers in the computer science community, I asked the speakers of 
those sessions to write survey papers based on their talks in the symposium. In 
response, five speakers from the sessions contributed the papers which appear 
in this book. 

In this book, the main topic is Kolmogorov complexity and its relations to 
the structure of complexity classes. As I explain in the Introduction, each paper 
discusses a different type of Kolmogorov complexity, and each paper uses a dif¬ 
ferent viewpoint in developing a relationship between Kolmogorov complexity 
and computational complexity. Thus, this book provides a good overview of 
current research on Kolmogorov complexity in structural complexity theory. 

I whish to thank Dr. Edwin Pednault, the Chair of the Symposium, for 
having organized the interesting sessions from which this book originated. Each 
paper was reviewed by some outside reviewer as well as by fellow authors. I 
would like to thank the outside reviewers, Professor Jose Balcazar, Professor 
Kojiro Kobayashi, and Professor Keri Ko, for their constructive comments. 

Osamu Watanabe 
May 1992 
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1 Complexity, Entropy, and Randomness 

Things can be large or small, and their size (the length or the volume or the 
weight or so on) can be measured by a number. Besides, things can be simple 
or complex, and their complexity can also be measured by a number. I do not 
know to whom we are indebted for measuring sizes by numbers. It was Andrei 
Kolmogorov [Kol65] who proposed to measure the complexity of a thing by a 
natural number (i.e., a non-negative integer), and he developed the rudiments 
of the theory. 

Complexity of things (as opposed to the complexity of processes, e.g., of 
computational processes) took the name descriptional complexity , or Kolmogorov 
complexity . As will be seen here, in appropriate cases one may say “entropy” 
instead of “complexity”. 

* Preparation of this paper was supported in part by the Institute of New Technologies 
at Moscow. 
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Thus we assume that there is a set Y of things, or objects, y' s, and a total 
function “complexity of y ” defined on Y. That function will be denoted by Compl 
and its possible values are 0,1,2,3,n,oo. So the function Compl is a total 
function from Y to 3N U {oo}. We do not put any further restrictions on Compl, 
but take it on an intuitive level as a measure of complexity, or a complexity 
function, or, shorter, a complexity. 

Let Compl! and Compl 2 be two measures of complexity. Let us say that 
Compl! is not worse than Compl 2 if 

Compl x (y) < Comply). 

w 


Explanation. The notation A(y) < B(y) means that for some constant c not 

$ 

depending on y and for all y , A(y) < B(y) + c holds. 


Let Z be some class of complexities, or (that is the same) of measures of 
complexity. Let Compl 0 , belonging to Z, be not worse than any complexity 
belonging to Z. Then Compl 0 is called optimal in the class Z. So a way of 
measuring complexity is called optimal if it gives, roughly speaking, the shortest 
complexities of things. Of course, a class of complexities may have no optimal 
one. 

Any optimal complexity is called an entropy . It is possible that a class Z 
has several entropies, but any two entropies, Enti and Ent 2 , fulfill the following 
condition: 


Ent iiy) = Ent 2 (y) 

w 

Explanation. The notation A(y) = B(y) means that \A(y) - B(y)\ < 0, or 

A{y) < B{y) and B{y) < A(y). 

W W 

Important Remark. There is no semantic problem when one speaks about an 
entropy related to some class of complexities. But in the theory of Kolmogorov 
complexity it is usual to speak about the entropy and even to denote it by a 
special notation. What does it mean? Here we have an abus de langage (after 
N.Bourbaki). Speaking about the entropy related to some class, one speaks in 
fact about an arbitrary entropy of that class. And the notation denotes any of 
such entropies. Of course, our statements must be invariant and do not change 
their truth value when a particular entropy changes to another one but still 
belonging to the same class. But we must be cautious. Let V and W be two 
classes of complexity functions, and let K be the entropy related to V and L be 
the entropy related to W. In fact, K and L denote two families of entropies, or, it 

is better to say, any entropies of those two families. When we write K(y) < L(y), 

Q 

we suppose that this relation < holds for any particular entropy denoted by K 

and any particular entropy denoted by L (so there is an additive constant hidden 
in this relation depending on the choices of particular representatives of K and 
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L). But when we declare that K and L coincide (are the same entropy), we do 
not want to express the opinion that any entropy denoted by K coincides with 
any entropy denoted by L. That is, we understand the coincidence statement in 
the following way: for any of the entropies K and any of the entropies L, there 
exists a constant c such that | K(y) - L(y) | < c for all y. 

Terminological Remark. In literature on Kolmogorov complexity, the term 
‘‘complexity’ 1 (synonymous with “complexity function” and “measure of com¬ 
plexity”) is most often used in the sense of the term “entropy”. But we make 
the distinction between those two terms: entropy is an optimal complexity. 

As has already been said, there may be no entropy among complexity func¬ 
tions belonging to a class Z. An important property of a class Z is that of having 
an entropy. In such case, we say that the Solomonoff-Kolmogorov Theorem holds 
for Z. 

There exist several important classes of complexities that contain entropies. 
And among those entropies, there are ones of special interest — namely, those 
entropies that can be used for a definition of randomness. Kolmogorov has pro¬ 
posed the following definition of randomness for an infinite binary sequence 
aia 2 ci 3 • * • a n • * •: the sequence is called random , or, more exactly, Kolmogorov 
random , if 

Ent(ai • • * a n ) > n, 

Q 

where Ent is an entropy. Of course, the choice of Ent is to be specified. Not 
every sort of entropy goes to a “good” definition of randomness; a definition by 
Kolmogorov scheme is regarded as “good” if the class of Kolmogorov random 
sequences sprung up by that definition coincides with the class of sequences that 
are random in the sense of Martin-Lof (or are typical sequences ; see [KU87a] and 
[KU87b] and [Mar66]). 

To sum up: in order to define an entropy, one must define ail appropriate 
class of complexities and show that the Solomonoff-Kolmogorov Theorem holds 
for that class. 


1.1 Generation of Complexities by Means of an Encoding Procedure 

The idea (due to Kolmogorov) is very simple. There are objects and there are de¬ 
scriptions (encodings) of objects, and the complexity of an object is the minimal 
size of its description. 

In more detail, there is a set Y of objects y , and a set X of descriptions 
(names, encodings) x. There is a volume function l defined on X\ that l is 
a function from X to IN. A mode of description , or a description mode , is an 
arbitrary set E C X x Y. If (x, y) € E, then x is called a description (a name, an 
encoding) of y with respect to E. Thus an object y may have many descriptions 
and a description may serve as a description for many objects. 
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The complexity ofy with respect to a description mode E is defined as follows: 

Complj s (y) = min{£(x) : (x,y) € E}. 

We make the convention that Comply (y) = oo if there is no x such that (x, y) £ 
E. 

Let £ be a class of modes of description. Each mode E £ £ gives the corre¬ 
sponding complexity function Comply. Then there arises the class Z = Z(£) of 
all complexity functions related to the inodes of £, and one may ask whether the 
class Z contains an optimal function, or an entropy. If such an optimal function 
exists, then it corresponds to some description mode which is also called optimal. 

Until now we have not imposed any restrictions on X , K, E. It is reasonable 
to assume that X and Y consist of constructive objects , and E is a generable set 
(in the sense of Post) and, consequently, is a recursively enumerable set. 

In the following exposition we shall restrict ourselves with the following sim¬ 
ple case: Both X and Y are S, where 2 is the set of all binary words, or finite 
binary sequences. The volume function t is defined to be £(£) = |£| for every 
£ £ 2, where |£| is the length of £. 


1.2 Two Symmetric Relations and Four Entropies 

Our task is to define.in a reasonable way—a class £ of modes of description 

as a class of subsets of 2 x 2. Having this goal in mind, we define a binary 
relation on 2 which we shall call the concordance relation : u\ and U 2 are called 
concordant if they have a mutual continuation, i.e., if there are ti,t 2 € 2 such 
that u\ti = u 2 t> 2 • This concordance relation will be denoted by 7 . 

Thus we have two natural binary relations on 2: the equality = and the 
concordance 7 . Both are symmetric and decidable (see Explanation in Sect. 1.3). 

Let a and ft be two binary relation on 2. We say that a mode of description 
E C 2 x 2 fulfills the (a, ft)-property if for every x\,x 2 ,yi,y 2 € 2, 

{xi,yi} £ E A (x 2 ,y 2 ) € E A x x ax 2 =* yifty 2 - 

Let us consider the class £ = £(a,/ 3) of all recursively enumerable modes of 
description that fulfill (a, ft )-property and the related class Z a ^ = Z{£) of 
complexity functions. If the class Z a 'P contains an optimal complexity function, 
that complexity function will be called (a, ft)-entropy. 

Now move from variable a and ft to constants = and 7 . Taking = or 7 as 
a or ft , we obtain four classes of complexity functions: Z ~'~, Z~' y < £ 7,= , £ 7,7 . 
For each of these four classes, the Solomonoff-Kolmogorov theorem is valid, so 
we have four entropies: 

1 . (=, =)-entropy, or ININ-entropy, 

2 . (=, 7 )~entropy, or INir-entropy, 

3. (7, =)-entropy, or iTlN-entropy, and 

4. ( 7 , 7 )-entropy, or SE-entropy. 
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Note. The notations ININ, etc., have the following origin. In [US81] and [US87], 
the notation “S” had the following meaning; the set of all binary words being 
considered together with relation 7 . In place of the set of all binary words with 
the relation = 011 that set, the set IN of all natural numbers with the relation 
= and the volume function l(x) = [log 2 (x -f 1)J was considered. This volume 
function is induced by the following 1-1 correspondence between IN and E: zero 
~ yl, one ~ 0 , two ~ 1 , three ~ 00 , four ~ 01 , five ~ 10 , and so on. 

1.3 Two Approximation Spaces and Four Entropies 

There is another way to come to the four basic entropies of Sect. 1 . 2 . 

Any set of constructive objects with a decidable partial ordering defined on 
that set will be called an approximation space. 

Explanation. The term “decidable” means that there is an algorithm to decide 
for any x f and x ", whether x f < x" or not. 

O 11 an intuitive level, the elements of an approximation space can be taken 
as informations, and x f < x n means that the information x” is a refinement of 
the information x! (and hence x u is closer than x! to some limit value to which 
both x f and x n serve as approximations). 

To develop a more attractive theory of approximation spaces, especially with 
the intention to apply this theory to an advanced theory of Kolmogorov com¬ 
plexity, one needs to include some additional requirements into the definition of 
an approximation space. For our goals, however, it suffices to have a decidable 
partial ordering. Moreover, only two approximation spaces will be considered: 
the bunch IB and the tree T. Their definitions follow immediately. 

- The bunch IB: The set of objects is S’, and the partial ordering < is =, i.e., 
u < w iff u = w. 

- The tree T: The set of objects is E. The partial ordering < is defined as 

follows: u < w iff u is a prefix of w (and w is a continuation of u), i.e., 

3v[uv = w]. 

Let X and Y be two approximation spaces. The spaces X and Y will be 
treated, respectively, as the space of descriptions (names, encodings) and as the 
space of the objects described (named, encoded). Our near goal is to define the 
class £ of acceptable description modes E C X x Y. 

We impose on E the following three requirements: 

1 . if (x,y) £ E and x ' > x , then (x\y) £ 25, 

2 . if (x,y) £ E and y f < y , then (x,y f ) € E , and 

3. if £ E and (x,^) € 25, then there exists a y that (x,y) € E, y 1 < y, 

and y 2 < y. 


Hence, the only cause of the existence of two different objects having the same 
description is the execution of the first requirement. 
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A description mode is called acceptable if it is (recursively) enumerable and 
fulfills all the above requirements. 

If one wishes to relate a complexity function with any description mode, one 
needs to introduce a volume function t defined on X. Here we are interested in 
the cases X = IB and X = T only. In both these cases, we put t(x) = |xj where 
|x{ is the length of x. 

Now let us fix approximation spaces X and Y , and let us consider the class 
of all acceptable description modes and the corresponding class of complexities. 
Let us ask whether the SolomonofF-Kolmogorov theorem holds for that class, 
and, if it does hold, then the related entropy will be called XY entropy. 

It turns out that the SolomonofF-Kolmogorov theorem is valid for four cases 
when X and Y is respectively IB or T: 

1 . For X — IB and Y = IB, we have BB-entropy, 

2. For X = IB and Y = T, we have BT-entropy, 

3. For X = T and Y = IB, we have TB-entropy, and 

4. For X = T and Y = T, we have TT-entropy. 

It is easy to see that IBIB-, IBT-, TIB-, TT-entropy respectively coincides with 
(=,=)-, (=,7)-, (7,=)., ( 7 , 7 )-entropy of Sect. 1 . 2 . Speaking 011 the coincidence, 
take into account the Important Remark of Sect.l. 


1.4 The Ordering of the Four Entropies 

Now we have four entropies, and any two of them, A and B, do not coincide; 

which means that the assertion A(y) = B(y) is not valid. Let us write A < B if 

Q 

A(y) < B(y) but not vice versa. Then there is a partial ordering 011 the set of 
0 

four entropies. That ordering can be shown by the following picture; Fig.l. 


The picture is directed from bottom to top. That is, it shows that 

IBT < BIB, BT < TT, BB < TB, TT < TB, 

and, of course, 

BT < TB. 

On the other hand, 

neither BB < TT nor TT < BB. 


1.5 Encoding-Free Generation of Complexities and Entropies 

It turns out that the four entropies of Sect. 1.4 admit an encoding-free definition 
with no use of such terms as “descriptions”, “names”, or “encodings”. 

As before and always, an entropy is defined as an optimal complexity function 
for some class Z of complexity functions; and all members of Z are functions 
from Y to IN U {oc}. So our goal is to describe appropriate classes Z. 

Having this goal in mind, let us introduce two conditions, C and 17, which 
could be imposed on a function f: Y —► IN U { 00 }. 



Complexity and Entropy 


TIB, 

or ( 7 ,=) 



BT, 

or (=, 7 ) 

Fig. 1. The ordering of the four entropies: TIB, IB IB, TT, IBT 

Condition C (of Cardinality of a set). Let n 6 N, and let M C Y be an arbi¬ 
trary set such that 

1 , any two elements of M are non-comparable, and 

2 . M C f- l {n). 

Then the cardinality of M is less than or equal to 2 n . 

Condition S (of summation of a series). Let M C Y be an arbitrary set such 
that any two elements of M are non-comparable. Then 

2 ~ f(v) < 1 . 

Explanation. Elements y\ and 2/2 are non-comparable if neither y\ < 7/2 nor 
V2 < 2/1 • 

Thus, an arbitrary function / may or may not satisfy Condition C or Con¬ 
dition And it is easy to see that Condition E implies Condition C. 
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Further, a definition of ''enumerability from above” is to appear. A function 
/ : Y —♦ IN U { 00 } is called enumerable from above if the set { {y, n) : y € K, n € 
IN,/(y) < n} is enumerable, that is, recursively enumerable. 

Let us denote by 2(0, Y) the class of all functions from Y to IN U { 00 } that 
are enumerable from above and satisfy the condition □, where □ is either C or 
X. Any element of 2(0, Y) may be called a O-acceptable complexity . Hence, we 
have four classes of acceptable complexities: 2(C, IB), 2( C,T), 2(2, IB), and 
2(2, T). For each of these four classes, there holds the Solomonoff-Kolmogorov 
theorem. 

Thus, there are four entropies: CB-entropy, CT-entropy, XB-entropy, and 
ITT-entropy. 

If one imposes the ordering on these four entropies, as in Sect. 1.4, then one 
obtains the following picture; Fig.2. 


rB 



ST 


Fig. 2. The ordering of four entropies: X*B, CB, XT, CT 


The four entropies of this section also admit definitions with slightly modified 
versions of conditions C and X. 


Condition CL There exists a constant b such that the cardinality of M is less 
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than or equal to 6 • 2 n for every M CY satisfying the requirements (i) and (ii) 
of Condition C. 

Condition E\ There exists a constant 6 such that 

^ 2 -/ly) < 6 

yeM 

for every M CY of mutually non-comparable elements. 

Classes Z( C',B), Z(C f , T), Z(E\TR), and Z(E f , T) differ from the corre¬ 
sponding classes Z{C, IB), Z(C, T), Z(E, IB), and Z{E, T); nevertheless, the 
related entropies coincide in the sense of Important Remark of Sect.l. That is, 

C'B = CB, C'T = CT, E'B = £B, and S 'T = ST. 

Condition S °°. For an arbitrary set M C Y of mutually non-comparable ele¬ 
ments, 

y] < -f oo. 

yeM 

It is obvious that Condition S°° is equivalent to Condition E f for Y = B. 
Hence, the Z'^B-entropy coincide with the i^'B-entropy and consequently with 
the JCB-entropy. 

Theorem (Andrei Muchnik). The conditions E* and E 00 are equivalent in 
the case Y = T. 

Hence the I7°°T-entropy coincides with the Z^T-entropy and with the ET - 
entropy. 


1.6 Relations between Two Quadruplets of Entropies 

Now we have two quadruplets of entropies: the quadruplet BB, BT, TB, and 
TT, which respectively generated by means of encoding, and the quadruplet 
CB, CT, JCB, and ET, which respectively generated by using some quantitative 
approach represented by conditions C and E . 

It turns out that (in the sense of the equality explained in Sect.l, Important 
Remark) the following relations hold: 

CB = BB, CT = BT, and ETB = TB. 

As to ET, we have the following non-trivial fact, which will be discussed in 
Sect.2.2(5): 


ET < TT. 


Summarizing them, we obtain the following picture; Fig.3. 
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TIB = XIB 



IBT = CT 

Fig. 3. The relation between entropies 


1.7 A Semantic for XT-Entropy 

Four entropies of Fig.3 have an encoding semantic, but the fifth entropy, XT, 
has not yet obtained an appropriate semantic. Now a semantic for XT will be 
set forth. That semantic is based upon probabilistic machines. 

To this end let us consider a probabilistic Turing machine with one-way 
infinite output tape whose head moves in only one direction. “Probabilistic 11 
means that one must flip a symmetric coin before performing any command, and 
the result of flipping determines which command is to be performed. Another 
version: at the input tape, there comes a random infinite binary sequence with 
equal probabilities of digits. We suppose that our machine has binary output 
alphabet and never stops, so a finite or infinite binary sequence appears on the 
output tape. 

Let us fix a machine M. For any y € X, let us denote by P \f(y) the probability 
of the event ‘y is the beginning of the output sequence 1 ; in this notation, “n” 
stands for “non-stop”. Consider a preorder relation < on the set of machines: 
M < N means that PWy) <n(v)- 
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Explanation. A{y) < B(y) means that for some constant c not depending on 
y , and for every y , A{y) < c • B(y). 

It turns out that there exists a maximal machine W such that M < W for 
every machine M . In fact, there are several such machines, but any two of them, 
U and V, satisfy the condition 


p&(y) = PMy). 

Explanation. A(y) = B(y) iff A(y) < B(y) and B(y) < A(y). 

Hence for any two maximal machines U and V. we have 

|log2 PMlf)! = ! lo S2 Pv(»)l- 

So we have moved from the probability P^r to its logarithm. For any maximal 
machine W one can verify that 


|l°g 2 Pfr(v)l = £T(») 

This fact enables us to identify |log 2 P^| (or, if you prefer, the integer 
U log 2 P^|J) with JET. (Recall again Important Remark of Sect.l). Then the 
probabilistic definition of P^ just given can be taken as a semantic for ST. 

The probability P^(t/), related to an arbitrary maximal machine W , can be 
called a priori probability of y as an element of the tree T. 

Remark. There exists also the a priori probability of y as an element of the 
bunch IB. To obtain the a priori probability of that second sort, one should 
consider probabilistic machines of a slightly different type. The change is: instead 
of machines that never stop, one should take now probabilistic machines that 
can stop. Then P s M {y) is, by definition, the probability of the event ‘the word 
printed on output tape after machine M stops coincides with y here “s” stands 
for “stop”. A preorder on machines and the notion of a maximal machine are 
defined as above, and maximal machines do exist. Then Pfv(l/) calculated for 
an arbitrary maximal machine W is the a priori probability of y as an element 
of IB. Here it occurs that 


|iog 2 PV(»)I = £©(</)• 

Hence SJB has a probabilistic semantic too. But, since JCIB = TIB, the entropy 
JCIB has also an encoding semantic. 
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1.8 Historical, Bibliographical, and Terminological Remarks; 
Acknowledgments 

We begin the history of the theory of Kolmogorov complexity with Kolmogorov’s 
paper [Kol65]. The purpose of that paper was to bring the notion of complexity 
(now we should say “of entropy”) to the foundations of information theory. In 
his paper Kolmogorov expounded some results of his studies of 1963-1964. In 
those years he knew nothing about the paper [Sol64] in which Ray Solomonoff 
presented some similar ideas — but in vague and rather non-mathematical man¬ 
ner. We place the paper [Sol64] in the prehistory of the theory of Kolmogorov 
complexity. At the early stage of the theory’s development, an important role 
belonged to the paper [ZL70]. 

In the papers of pioneers of the theory, there were introduced all five basic 
entropies of our Sect. 1.6. The authors gave them various names and various 
notations. What was common in all those notations was the use of the letter 
“K” or the letter “k” as a part of the notation; one should believe the cause 
of this usage is a homage to Kolmogorov. Here we try to set some system of 
names and notations with the observance of the historical tradition. (In such a 
way the author makes his own contribution to the existing chaos of names and 
notations. This contribution is not too great because some names and notations 
are already in use. Simultaneously the author expresses the hope to introduce a 
standard system.) 

We would like to fix the following names and notations for the five basic 
entropies. 

1. For IBIB-entropy. Name: simple entropy; notation: KS. 

2 . For IBT-entropy. Name: decision entropy; notation: KD. 

3. For TIB-entropy. Name: prefix entropy; notation: KP. 

4 . For TT-entropy. Name: monotonic entropy; notation: KM. 

5. For i7T-entropy. Name: a priori entropy; notation: KA. 

The entropies should be attributed to the following authors: 

- simple entropy KS to Kolmogorov [Kol65](§3) and also (though in some 
nebulous form) to Solomonoff [Sol64], 

- decision entropy KD to Loveland [Lov69], 

- a priori entropy KA to Levin [ZL70](n° 3.3) and [Lev73], 

- monotonic entropy KM to Levin [Lev73], and 

- prefix entropy KP to Levin [Lev76]. 

Remark. Strictly speaking, we denote by the symbols KS, KD, KA, KM and 
KP exactly those versions of entropies as they were formulated by Kolmogorov, 
Loveland, and Levin. Let us recall the Important Remark of Sect.l. The co¬ 
incidence KS with IBIB has the following meaning: for any particular entropy 

KS and for any particular entropy IBIB, there holds KS (y) = BIB(t/). The other 

Q 

coincidences, KD with IBT, etc., are to be understood in the same way. 
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Attributing the entropies to their inventors, we make no claim about the us¬ 
age of these notations by the inventors. None of them made any essential use of 
the term “entropy’'; usually the term “complexity” was used. Kolmogorov used 
simply the word “complexity” with no adjective. Loveland used the term “uni¬ 
form complexity”, and it was renamed as “decision complexity” by Zvonkin and 
Levin [ZL70](Definition 2.2). Levin used the words “monotonic complexity” and 
“complexity related to a prefix algorithm”. He had not introduced any name for 
KA, but used terms “universal semicomputable measure” (in [ZL70](n° 3.3)) and 
“a priori probability” (in [Lev73]) for related quantities of which the logarithm 
is to be taken. 

As to notations, Kolmogorov in [Kol65](§3) employed the notation K^(^) for 
the simple entropy. Loveland in [Lov69](p.513) employed the notation K^(a: n ; n) 
for the decision entropy; and Zvonkin and Levin used for it the notation KR 
[ZL70](Definition 2.2). Zvonkin and Levin in [ZL70](n° 3.3) employed the nota¬ 
tion — log 2 R{/ 1 I } for the a priori entropy; later, in [Lev73] that entropy was 
denoted by Levin as kM. In the same paper [Lev73] the notation km was used 
for the monotonic entropy. The notation KP (for the prefix entropy) appeared 
in [Lev76]. 

The general idea of an approximation space as a space of informations re¬ 
fining, or exactifying, one another is, without doubt, due to D. Scott. This idea 
was embodied into the notion of /o-space in the sense of Yu. Ershov. A classi¬ 
fication of entropies on the basis of that notion is given in [She84](Theorem 8); 
the classification of our Sect.1.3 is very close to that of [She84]. The general idea 
of the encoding-free approach to entropies (see Sect. 1.5 above) was laid down in 
[Lev76]. 

A very useful exposition of various entropies and their interrelation is given 
in [Vyu8l]. A survey of the use of entropies in a definition of randomness is 
presented in [KU87a] and [KU87b]. 

In the process of preparing this paper, the author had many discussions 
with Andrei Muchnik, Alexander Shea’, and Nikolai Vereshchagin. The author 
enjoyed their advice and help. Many final formulations emerged from those 
thankworthy discussions. The bounds of Sect.2.1 and of Sect.2.2 probably be¬ 
longs to what is called “mathematical folk-lore”, but the final formulae are also 
due to discussions with Muchnik, Shell’, and Vereshchagin. 

To conclude this section let us redraw Fig.3 in terms and notations that we 
accept as standard; Fig.4. 

The pentagon of Fig.4 shows, in particular, that neither KA < KS nor KS < 
KA. The exclamation note attached to an entropy means that the entropy can 
be used in the Kolmogorov definition of randomness. 

2 Quantitative Analysis on Entropies 

2.1 Bounds for Entropies 

Some upper and lower bounds for entropies will be written down in this section. 
But first of all the reader must be warned that in this exposition, the sense of an 
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prefix entropy (!) 


KP 



KD 


Fig. 4. Five basic entropies 


upper bound and the sense of a lower bound are rather different. An upper bound 
for an entropy shows that the entropy cannot be too large. A lower bound for an 
entropy does not show that the entropy cannot be too small but does show that 
in infinitely many instances, the entropy can be large enough. So upper bounds 
are absolute, or strong, upper bounds. Lower bounds are not absolute; we shall 
call them weak lower bounds. A weak lower bound has the purpose of supporting 
the corresponding upper bound and demonstrating, in a favorable case, that the 
upper bound cannot be improved. 

After this warning let us consider the five basic entropies of Sect. 1.8, Fig.4. 
(1) Entropies KS, KM, KA, and KD. 

Let Ent denotes one of the entropies KS, KM, KA. KD. Then (an upper 
bound) for all y (in E), 


Ent(y) < |j/|, 


and (a weak lower bound) for infinitely many y (in E), 
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Ent(y) > |y|. (1) 

Let us formulate the lower bound of (1) more exactly. We have four cases: Ent = 
KS, Ent = KM, Ent = KA, Ent — KD. For each of these cases, the symbol 
Ent (as well as KS, KM, KA or KD) denotes an arbitrary function belonging to 
some collection, i.e., the collection of Ent-e ntropies. In each case the meaning of 
(1) is as follows: for any particular function Ent of that collection, there exist a 
constant c, perhaps negative, and an infinite set M C E such that 

Vj/ € M [ Ent(y) > |y| +c ]. 


(2) Entropy KP. 

It is helpful to introduce some notation. A function qlog, quasilogarithm, is 
introduced by the following definition: 


qlogz 


_ / log 2 z,z>l, 


\ 0, z < 1. 

The iterations of that function are defined as follows: 

qlog (1) z = qlogz, and qlog (fc+1 ^z = qlog(qlog (fc) z). 
Then we have for any k , any e > 0, and all y , 


KP(y) < |y|+qlog (1) |y|+qlog (2) |y| + ---4-qlog (fc x) |y| + (1 + e)qlog (fc) |j/|. (2) 

w 

It is an upper bound for KP. 

For a weak lower bound, let k be an arbitrary positive integer. Then, for 
infinitely many y , 

[ KP(y) > |j/| + qlog (1) |y|+qlog (2) |j/| +-h qlog (fc) |y| ]. (3) 

That means that inequality (3) holds for any function denoted by KP (i.e. for 
any prefix entropy) and for an appropriate infinite set of y' s, depending on the 
choice of that function. 

Now it is reasonable to introduce an abbreviation for the sum qlog (1) zH-h 

(1 + e)qlog (fc ^. Let us take, e.g., Qk(z^e) as such an abbreviation. That is, 

Qk(z,e) = qlog (1) z + qlog (2) z -4-b qlog ( * -1 *z +(14* €)qlog (fc) z. 

Then (2) and (3) can be rewritten as follows: 

Vy [KP(y) < |y| + Qfc(|y|,€) ], 

and 

for infinitely many y [ KP(y) > \y\ + Qfc(|y|,0) ]. 
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2.2 Bounds for Differences of Entropies 


By how much can two entropies of different sorts, e.g., KP and KD, can differ 
from one another? Perhaps it is better to ask, how much can one entropy exceed 
the other? Upper and lower bounds are to give the answer. The warning of the 
beginning of Sect.2.1 about the different meanings of upper and lower bounds is 
valid here and now too. 

When A(y) < B(y ), an upper bound for the difference A(y) - B(y) is trivial; 
Q 


namely, a constant. So in this section, only differences A(y) - B(y) for which the 
assertion A(y) < B(y ) is false will be studied. 


Now let us proceed to the differences. 


(1) Difference KP — KD. 

For any fc, any e > 0, and all y, 


my) - my) < qiog|y| + «)• ( 4 ) 

w 

For any fc and infinitely many y , 

KP(y) - KD(y) > qlog|y| + Q h (\y\, 0). (5) 

Note. Let us not forget that an additive constant implied in (4) depends not 
only on fc and € but also on the particular versions of KP and KD. In (5) the set 
of y’s depends not only on fc but also on the versions of KP and KD. This point 
is valid for further inequalities related to lower and upper bounds. 


(2) Differences KS - KM and KS - KA. 

Let \y\ 7 *^ 0. Then, for all y , 

KS(y) - KU(y) < KS (y) - KA (y) < log 2 |y|. 
(+1 w 

For some c and for infinitely many y, 

KS(y) - KM(y) > log 2 |»|+c], 
and for some c and for infinitely many 7/, 

KS(t/) - KA(y) > log 2 \y\ + c. 

(3) Differences KM - KS and KA - KS. 

For any fc, any e > 0, and all y , 

KM( y )-KS(iO<:g*(|»|,e), 

KA(y)-KS(y)<Q k (\y\,e). 
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For any k and infinitely many y, 

KM(y)-KS(y)>Q k (\ylO), 

KA(y)~KS(y)>Q k (\y\,0). 

(4) Differences KP-KS, KS-KD, KP-KM, KP-KA, KM-KD, and KA-KD. 

Let B — A be any of the six entropy differences mentioned above. For any k, 
any e > 0, and all y, 


B(y)-A(y) < Q*(|y|,e). 

And for any k and infinitely many y , 

B{y) - A(y) > Qj;(|y|,0). 

(5) The Difference KM - KA. 

This difference is of special interest. The very fact that the conjecture KM(y) 

= KA(y) is false is disappointing. The refutation of that conjecture is due to 
© 

Peter Gacs [Gac83]. (The Hungarian surname “Gacs” is to be pronounced as 
English “garch”.) 

Both KM and KA are defined on the binary tree T. Gacs studied two en¬ 
tropies K and H of similar sorts; but his K and H are defined not on T but on 
the tree consisting of all words in a countable alphabet (say, in IN if one takes IN 
as an alphabet). Some bound for the difference K — H is stated in Theorem 1.1 of 
[Gac83]; there the author writes: “Therefore for binary strings, the lower bound 
obtainable from the proof of Theorem 1.1 is only the inverse of some version 
of Ackermann’s function” [Gac83](p.75). As it is known, Ackermann’s function 
is a function from IN to IN which exceeds in its growth any primitive recursive 
function. The inverse /~ 1 for a function / is defined as follows: 

f~ l (a) = min{z : f(z) > a }. 

Thus, for infinitely many y, 

KM( 3/ )-KA(3/)>/- 1 (|j/|), (6) 

where / is the version of Ackermann’s function mentioned by Gacs. 

Let Z (y) denote the number of zeros in the word y. Then, as a corollary of 
Theorem 1.1 of [Gac83], we have the following: for any k and for any ?n, there 
exists a y € S such that Z (y) > m and 

KM(y) - KA(y) > Qk{Z(y),Q). (7) 

Therefore, we have two weak lower bounds. As to upper bounds, there is 
known no one except the following trivial one: for any fc, any e > 0, and all y , 

KM(y)-KA (y) <Qk(\y\,e). (8) 

Since the weak lower bounds (6) and (7) do not support the upper bound 
(8), the task of improving all those bounds is open. 
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